AB Genome sequencing projects are rapidly identifying all of the genes in
several organisms. The products of these genes are widely recognized as
the next generation of therapeutics and targets for the development of
pharmaceuticals. While identification of these genes is proceeding
quickly, elucidation of their ***three***-***dimensional*** (3D)
structures and biochem. functions lags far behind. In some cases,
knowledge of 3D structures of ***proteins*** can provide important
insights into structural homol. that is not easily recognized by sequence
alignment comparisons. Thus, anal. of a ***protein***'s 3D structure
by ***NMR*** or X-ray crystallog. prior to characterization of the
***protein***'s biochem. ***function*** can sometimes provide key
information regarding ***protein*** fold class, locations and
clustering of conserved residues, and surface electrostatic field
distributions. This information can be used to develop hypotheses
regarding potential biochem. functions, and the resulting limited set of
putative biochem. functions tested by appropriate biochem. assays.
***NMR*** chem. shift assignments and soln. structures of
***proteins*** also provide the basis for epitope-mapping, mol.
dynamics, and SAR studies, and set the stage for subsequent drug
development using combinatorial and/or rational design methods. We are
developing technologies that will significantly accelerate the process of
structure detn. by ***NMR***. These include bioinformatics methods
for ***parsing*** novel genes into domain encoding regions, high-level
"multiplexed" ***protein*** expression systems, and ***NMR***
pulse sequences, data collection methods, and expert-system software for
automated anal. of ***protein*** resonance assignments and 3D
structures. These technologies and the resulting exptl. data are being
organized and integrated using relational databases. The goal of this
work is to develop a "high-throughput" process for structural anal. of
novel gene products on a genomic scale. In a pilot project, these
techniques are being applied to clusters of orthologous genes coding for
***proteins*** of unknown structure and ***function***, with the aim
of testing the hypothesis that 3D structural anal. can sometimes provide
useful and important clues regarding the biochem. functions of orphan gene
products. The relationship of our effort and the emerging international
interest in a large-scale Human Proteome Project will be discussed.
US 1998-181601 19981029
AB The present invention provides a ***structure***-***functional***
anal. engine for the high-throughput detn. of the biochem.
***function*** of ***proteins*** or ***protein*** domains of
unknown ***function***. The present invention uses bioinformatics,
mol. biol. and ***NMR*** tools for the rapid and automated detn. of
the ***three***-***dimensional*** structures of ***proteins***
and ***protein*** domains.
AB The practical exploitation of the vast numbers of sequences in the genome
sequence databases is crucially dependent on the ability to identify the
***function*** of each sequence. Unfortunately, current methods,
including global sequence alignment and local sequence motif
identification, are limited by the extent of sequence similarity between
sequences of unknown and known ***function***; these methods
increasingly fail as the sequence identity diverges into and beyond the
twilight zone of sequence identity. To address this problem, a novel
method for identification of ***protein*** ***function*** based
directly on the sequence-to-***structure***-to-***function***
paradigm is described. Descriptors of ***protein*** active sites,
termed "fuzzy functional forms" or FFFs, are created based on the geometry
and conformation of the active site. By way of illustration, the active
sites responsible for the disulfide oxidoreductase ***activity*** of
the glutaredoxin/thioredoxin family and the RNA hydrolytic
***activity*** of the T1 ribonuclease family are presented. First, the
FFFs are shown to correctly identify their corresponding active sites in a
library of exact ***protein*** models produced by crystallography or
***NMR*** spectroscopy, most of which lack the specified
***activity***. Next, these FFFs are used to screen for active sites in
low-to-moderate resolution models produced by ab initio folding or
threading prediction algorithms. Again, the FFFs can specifically identify
the functional sites of these ***proteins*** from their predicted
structures. The results demonstrate that low-to-moderate resolution models
as produced by state-of-the-art ***tertiary*** ***structure***
prediction algorithms are sufficient to identify ***protein*** active
sites. Prediction of a novel ***function*** for the gamma subunit of a
yeast glycosyl transferase and prediction of the ***function*** of two
hypothetical yeast ***proteins*** whose models were produced via
threading are presented. This work suggests a means for the large-scale
functional screening of genomic sequence databases based on the prediction
of structure from sequence, then on the identification of functional
active sites in the predicted structure. Copyright 1998 Academic Press
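
Illustrative note: the abstract above describes active-site descriptors
built from geometry, applied to both exact and low-resolution models. The
Python sketch below shows one simple way such a descriptor could be
screened against a structure, assuming it is encoded as a list of required
residue types plus allowed C-alpha distance ranges. The names DESCRIPTOR
and find_sites, the encoding, and every numeric value are hypothetical and
are not taken from the cited work.

```python
from itertools import permutations
from math import dist

# Hypothetical descriptor: required residue types plus allowed C-alpha
# distance ranges (angstroms) between descriptor positions. The numbers
# are illustrative only, not values from the cited work.
DESCRIPTOR = {
    "residues": ["CYS", "CYS", "PRO"],
    "distances": {(0, 1): (4.5, 6.5), (0, 2): (6.0, 10.0)},
}


def find_sites(structure, descriptor):
    """Return tuples of residue numbers that satisfy the descriptor.

    structure: list of (residue_number, residue_name, (x, y, z)),
    one C-alpha coordinate per residue.
    Brute force over ordered residue tuples; fine for a small sketch,
    too slow for whole-database screening.
    """
    wanted = descriptor["residues"]
    hits = []
    for combo in permutations(structure, len(wanted)):
        # Residue types must match position by position.
        if any(res[1] != name for res, name in zip(combo, wanted)):
            continue
        # Every listed pair distance must fall inside its allowed range.
        if all(lo <= dist(combo[i][2], combo[j][2]) <= hi
               for (i, j), (lo, hi) in descriptor["distances"].items()):
            hits.append(tuple(res[0] for res in combo))
    return hits


# Toy structure with made-up coordinates.
toy = [
    (32, "CYS", (0.0, 0.0, 0.0)),
    (35, "CYS", (5.5, 0.0, 0.0)),
    (76, "PRO", (-4.0, 8.0, 0.0)),
    (90, "CYS", (30.0, 0.0, 0.0)),
]
print(find_sites(toy, DESCRIPTOR))  # [(32, 35, 76)]
```

Using distance ranges rather than exact coordinates is what lets a check
of this kind tolerate the coordinate error of predicted models; wider
ranges trade specificity for tolerance.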
AB Since the 1980s, structural studies of ***proteins*** have changed
remarkably. It is currently possible to predict the entire amino acid
sequence of a ***protein*** by the rapid and highly sensitive analysis
of the nucleotide sequence of genomic DNA or cDNA encoding the
***protein***. In the near future, the entire sequence of a
***protein*** may be predicted from a partial sequence just by searching
a variety of databases now being constructed for many biological species.
The predicted ***protein*** sequence, however, is the backbone
structure of the precursor ***protein*** without post-translational
modifications. Therefore, the major objectives of recent structural
studies of ***proteins*** are directed to 1) rapid and sensitive
confirmation of the predicted sequence and identification of those
modifications present in mature ***proteins*** by newly developed mass
spectrometry, 2) determination of the 3D structures of intact and mutant
***proteins*** isolated or expressed in cultured E. coli, yeast or
animal cells using X-ray crystallography or ***NMR*** analysis, and 3)
rapid prediction of the 3D structures of ***proteins*** utilizing
***protein*** databases. The "PROTEOME" project was proposed in 1998 to
bring together all the data on the ***structure*** and
***function*** of mature ***proteins*** under international
cooperation. The present paper summarizes such recent trends in
***protein*** structural studies.
AB A review with 28 refs. An understanding of the role played by a
***protein*** in cellular ***function*** requires a detailed picture
of its ***three***-***dimensional*** structure as well as an
appreciation of how the ***structure*** varies as a ***function***
of time due to mol. dynamics. Over the past several years
multi-dimensional, multi-nuclear soln. ***NMR*** spectroscopy has
become a powerful technol. for obtaining both structural and dynamic
information on ***proteins*** and ***protein***-ligand systems.
However, until recently the methods were limited to the study of mols.
having mol. wts. on the order of 25 kDa or less. Recent developments
making use of fractional or complete deuteration have increased the scope
of structural studies by ***NMR*** and have also improved studies of
AB It is well established that sequence templates such as those in the
PROSITE and PRINTS databases are powerful tools for predicting the
biological ***function*** and ***tertiary*** ***structure***
for newly derived ***protein*** sequences. The number of X-ray and
***NMR*** ***protein*** structures is increasing rapidly and it is
apparent that a 3D equivalent of the sequence templates is needed. Here,
we describe an algorithm called TESS that automatically derives 3D
templates from structures deposited in the Brookhaven ***Protein***
Data Bank. While a new sequence can be searched for sequence patterns, a
new structure can be scanned against these 3D templates to identify
functional sites. As examples, 3D templates are derived for enzymes with
an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When
these 3D templates are applied to a large data set of nonidentical
***proteins***, several interesting hits are located. This suggests that
the development of a 3D template database may help to identify the
***function*** of new ***protein*** ***structures***, if
unknown, as well as to design ***proteins*** with specific functions.
AB Human lithostathine is a 144-residue ***protein***, expressed in
various organs and pathologies. Several biological functions have been
proposed for this ***protein***. Among others, inhibition of
nucleation and growth of CaCO3 crystals in the pancreas and bacterial
aggregation has retained attention, because lithostathine presents high
sequence similarities with calcium-dependent (or C-type) lectins. To study
its ***structure***-***function*** relationship and compare it
with that of C-type lectins, we have built a model for lithostathine. This
model is derived from the only two C-type lectins of known structures: rat
mannose binding ***protein*** and human E-selectin. An original
strategy, inspired by that proposed by Havel and Snow, was designed for
model building. We have undertaken ***NMR*** studies on the natural
***protein***. Although complete structure determination has not yet
been achieved, the ***NMR*** studies did confirm the main
characteristics of the model. From analysis of the proposed model, we
concluded that lithostathine is not expected to present sugar- or
calcium-binding properties. Therefore, the mechanisms of bacterial
aggregation and inhibition of CaCO3 nucleation and growth have not yet
been elucidated.
TI Modern ***NMR*** spectroscopy and x-ray crystallography. Different
approaches to study the ***structure*** and its ***function*** of
a ***protein***
AU Tsuda, Sakae
CS Biosci. Chem. Div., Hokkaido Natl. Ind. Res. Inst., Sapporo, 062, Japan
SO Nippon Kessho Gakkaishi (1996), 38(1), 84-8
AB A review with 27 refs. ***NMR*** spectroscopy has been utilized
widely for elucidation of the structural changes of ***proteins***
caused by changes in pH, ionic strength, temp., and ligand concn. in soln.
X-ray crystallog. has been less utilized for these studies, which are
easily carried out in soln., but is utilized much for the structural detn.
of a ***protein***. This difference has led to the situation where
***NMR*** relied on the structure solved by x-ray and x-ray argued its
structure in ref. to the conformational change elucidated by ***NMR***.
However, recent developments of ***NMR*** spectroscopy have made it
possible to det. the ***three***-***dimensional*** structure, and
x-ray techniques have also been developed to clarify the structural
changes of a ***protein***. This review compares the recent development
of these two techniques, and discusses the future collaborative
interaction between ***NMR*** and x-ray.
TI ***NMR***: This other method for ***protein*** and nucleic acid
structure determination.
AU Wuthrich, Kurt
CS Inst. Molekularbiol. und Biophysik, Eidgenossische Technische
Hochschule-Honggerberg, CH-8093 Zurich, Switzerland
SO Acta Crystallographica Section D Biological Crystallography, (1995) Vol.
AB For a quarter of a century X-ray diffraction in single crystals was unique
in its ability to solve ***three***-***dimensional*** structures
of ***proteins*** and nucleic acids at atomic resolution. The
situation changed in 1984 with the completion of a ***protein***
structure determination by ***nuclear*** ***magnetic***
***resonance*** (***NMR***) spectroscopy in solution, and today
***NMR*** is a second widely used method for biomacromolecular structure
determination. This review describes the method of ***NMR*** structure
determination of biological macromolecules, and attempts to place
***NMR*** structure determination in perspective with X-ray
crystallography. ***NMR*** is most powerful for studies of relatively
small systems with molecular weights up to about 30000, but these
structures can be obtained in near-physiological milieus. The two
techniques have widely different time scales which afford different
insights into internal molecular mobility as well as different views of
***protein*** or nucleic acid molecular surfaces and hydration.
Generally, in addition to information on the average
***three***-***dimensional*** structure, ***NMR*** provides
information on a wide array of short-lived transient conformational
states. Combining information from the two methods can yield a more
detailed insight into the structural basis of ***protein*** and nucleic
acid functions, and thus provide a more reliable platform for rational
drug design and the engineering of novel ***protein*** functions.
TI ***Structure*** and ***function*** of ***protein*** modules
AU Go, Mitiko
AB Globular ***proteins*** are decompd. into several compact modules.
Modules consist of about 10-40 contiguous amino acid residues and module
boundaries are correlated with intron positions of genes. To clarify the
physico-chem. basis of modules as structural units of ***proteins***,
the authors synthesized module M1 of barnase, a bacterial RNase, and detd.
its secondary structure in soln. by using the ***NMR*** technique. M1 had
an alpha-helix at a location similar to the corresponding helix of the
intact barnase. This result shows that the excised module has a propensity
to form secondary structure similar to that of the intact barnase. This
propensity should be an important feature of modules advantageous as parts
recruited into globular ***proteins*** through exon shuffling in early
evolution. To identify functionally important regions of large
***proteins*** without their ***three***-***dimensional***
information, the authors applied a method for prediction of module
boundaries to human CCG1. The authors obtained a close correlation
between predicted modules and exon/intron structure of the human CCG1 gene.
The predicted 152 modules of CCG1 show a close correlation with the
temporarily assigned ***function*** of CCG1. This result opens a new
exptl. approach to det. functionally important regions of huge
***proteins***; synthesis of a module or jointed modules of the
***proteins*** by chem. method and detn. of its ***function*** will be
useful for identification of each functional region of ***proteins***.
TI ***Function*** and ***three***-***dimensional***
***structure*** of ***proteins*** using ***nuclear***
***magnetic*** ***resonance*** spectroscopy
AB Although the ***three***-***dimensional*** structure of a
***protein*** can provide valuable information and stimulate rational
investigation of other important features of the ***protein***, it is
important to stress that a structure per se is rarely a revelation of the
biol. ***function*** of the ***protein***. This paper emphasizes
the importance of acquiring results that measure the fundamental phys.
chem. parameters in ***protein*** ***function*** events and the
importance of getting quant. information to support our understanding of
the link between phys. parameters that describe ***function*** and the
biol. relevance of a ***protein*** mol. It is emphasized that
***NMR*** spectroscopy, because it combines the ability of measuring
***three***-***dimensional*** structure and the ability of measuring
many phys. parameters related to both ***structure*** and
***function***, is one of the key techniques in structural biol.
AB We present an automated method incorporated into a software package,
FOLDER, to fold a ***protein*** sequence on a given
***three***-***dimensional*** (3D) template. Starting with the sequence
alignment of a family of homologous ***proteins***, tertiary structures
are modeled using the known 3D structure of one member of the family as a
template. Homologous interatomic distances from the template are used as
constraints. For nonhomologous regions in the model ***protein***, the
lower and the upper bounds for the interatomic distances are imposed by
steric constraints and the globular dimensions of the template,
respectively. Distance geometry is used to embed an ensemble of structures
consistent with these distance bounds. Structures are selected from this
ensemble based on minimal distance error criteria, after a penalty
***function*** optimization step. These ***structures*** are then
refined using energy optimization methods. The method is tested by
simulating the alpha-chain of horse hemoglobin using the alpha-chain of
human hemoglobin as the template and by comparing the generated models
with the crystal structure of the alpha-chain of horse hemoglobin. We also
test the packing efficiency of this method by reconstructing the atomic
positions of the interior side chains beyond C beta atoms of a
***protein*** domain from a known 3D structure. In both test cases,
models retain the template constraints and any additionally imposed
constraints while the packing of the interior residues is optimized with
no short contacts or bond deformations. To demonstrate the use of this
method in simulating structures of ***proteins*** with nonhomologous
disulfides, we construct a model of murine interleukin (IL)-4 using the
***NMR*** structure of human IL-4 as the template. The resulting
geometry of the nonhomologous disulfide in the model structure for murine
IL-4 is consistent with standard disulfide geometry.
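
Illustrative note: the abstract above sets lower and upper bounds on
interatomic distances and then embeds structures consistent with them. A
common preparatory step in distance geometry generally is to tighten such
bounds with the triangle inequality. The Python sketch below shows that
step under stated assumptions; it is not taken from FOLDER, the function
name smooth_bounds is hypothetical, and the example distances are made up.

```python
def smooth_bounds(lower, upper):
    """Tighten distance bounds using the triangle inequality.

    lower, upper: symmetric square lists of lists with zero diagonals
    and lower[i][j] <= upper[i][j]. Returns new (lower, upper) matrices.
    This is a single Floyd-Warshall style sweep; the lower bounds it
    returns are valid but not guaranteed to be the tightest possible.
    """
    n = len(upper)
    lo = [row[:] for row in lower]
    up = [row[:] for row in upper]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Upper bound: d(i,j) <= d(i,k) + d(k,j)
                via_k = up[i][k] + up[k][j]
                if up[i][j] > via_k:
                    up[i][j] = via_k
                # Lower bound: d(i,j) >= d(i,k) - d(k,j), and the same
                # with the roles of i and j exchanged.
                lo[i][j] = max(lo[i][j],
                               lo[i][k] - up[k][j],
                               lo[j][k] - up[k][i])
                if lo[i][j] > up[i][j]:
                    raise ValueError("inconsistent distance bounds")
    return lo, up


# Made-up example with three atoms. Pairs 0-1 and 1-2 are "homologous"
# (distance fixed by the template); pair 0-2 only has a steric lower
# bound (2.0) and a globular-size upper bound (50.0).
lower = [[0.0, 10.0, 2.0], [10.0, 0.0, 4.0], [2.0, 4.0, 0.0]]
upper = [[0.0, 10.0, 50.0], [10.0, 0.0, 4.0], [50.0, 4.0, 0.0]]
lo, up = smooth_bounds(lower, upper)
print(lo[0][2], up[0][2])  # 6.0 14.0
```

In the example the loosely bounded pair is narrowed from [2.0, 50.0] to
[6.0, 14.0] purely by the two template-derived distances, which shrinks
the space that an embedding step would have to sample.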
prediction of protein structure.
AB This tutorial was one of eight tutorials selected to be presented at the
Third International Conference on Intelligent Systems for Molecular
Biology which was held in the United Kingdom from July 16 to 19, 1995.
The authors intend to review the state of the art in the experimental
determination of protein 3D structure (focus on nuclear magnetic
resonance), and in the theoretical prediction of protein function and of
protein structure in 1D, 2D and 3D from sequence. All the atomic
resolution structures determined so far have been derived from either
X-ray crystallography (the majority so far) or Nuclear Magnetic Resonance
(NMR) Spectroscopy (becoming increasingly more important). The authors
briefly describe the physical methods behind both of these techniques;
the major computational methods involved will be covered in some detail.
They highlight parallels and differences between the methods, and also
the current limitations. Special emphasis will be given to techniques
which have application to ab initio structure prediction. Large scale
sequencing techniques increase the gap between the number of known
protein sequences and that of known protein structures. They describe
the scope and principles of methods that contribute successfully to
closing that gap. Emphasis will be given on the specification of adequate
testing procedures to validate such methods.



